Introduction

The Coachella Valley Music and Arts Festival (commonly referred to as Coachella or the Coachella Festival) is an annual music and arts festival held at the Empire Polo Club in Indio, California, located in the Inland Empire’s Coachella Valley in the Colorado Desert. Coachella is one of the largest, most famous, and most profitable music festivals in the United States and worldwide. The 2017 festival was attended by 250,000 people and grossed $114.6 million.

To find out what Internet users were talking about and how they felt about the Coachella music festival, this report analyzes data collected from Twitter a week after the event took place. The report focuses on three parts:

1. The frequency of words mentioned by users, shown as wordclouds.

2. Visualization of sentiment toward the music festival across different hashtags and locations: did people like Coachella 2018 more than past years’ Coachella? Was Beyonce’s performance at Coachella 2018 received better than Coachella 2018 as a whole, or than the Coachella event on average? Besides static maps, there is also a Shiny application that creates an interactive map of tweet popularity.

3. Statistical analysis of whether there is a relationship between retweet count and sentiment score.

Data summary

I gathered 1956 observations in total (after deleting NA data points). The sample size was limited by the Google geocoding API, which allows 2500 requests per day for non-business use. The data comprise 638 observations for #Coachella2018, 620 observations for #Coachella, and 698 observations for #Beychella.

To get a general understanding of the most popular words people use in tweets to express their sentiment about the music festival, below are the wordclouds for each topic and for the total data frame.

From the wordcloud for the total data frame, we can see both positive words such as “highest”, “lol”, “winners”, and “great” and negative words such as “damn”. “Beychella” is one of the most mentioned words, which did not surprise me because of her extraordinary performance at Coachella 2018 and her overall popularity.
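
The word counts behind a wordcloud can be sketched in a few lines. The report's own code is not shown (its outputs come from R), so the Python below is only an illustration: the function name `word_frequencies`, the small stop-word list, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "at", "is", "to", "of", "in", "on", "for", "rt"}


def word_frequencies(tweets, stop_words=STOP_WORDS):
    """Count how often each word appears across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase the text and drop URLs before tokenizing.
        text = re.sub(r"https?://\S+", " ", text.lower())
        # Keep letters, digits, and apostrophes; "#" and punctuation are dropped.
        words = re.findall(r"[a-z0-9']+", text)
        counts.update(w for w in words if w not in stop_words and len(w) > 1)
    return counts


# Made-up sample tweets, not taken from the collected data.
sample_tweets = [
    "Beychella was great! #Beychella",
    "Great weekend at #Coachella2018 https://example.com/x",
    "RT Beychella is the highlight",
]
frequencies = word_frequencies(sample_tweets)
print(frequencies.most_common(2))  # [('beychella', 3), ('great', 2)]
```

A wordcloud then simply draws each word at a size proportional to its count, so the most frequent words stand out.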

Wordcloud for all data

Wordcloud for hashtag #Coachella2018

Wordcloud for key words #Coachella

Wordcloud for hashtag #Beychella

Comparing the wordcloud for the total data with the three wordclouds for the individual topics, we can see that “Beychella” is mentioned a lot in the total data. “Great” is also mentioned a lot, which suggests people are enjoying the festival. The wordcloud for #Coachella2018 focuses on words like “great”, “stagecoach” (a California country music festival held around the time of the Coachella music festival), and “different”. This suggests that most people enjoyed the music festival, but a fair number of people still have ambivalent feelings about this year’s festival, describing it as “different”. Tweets under the key word #Coachella were more positive than tweets under #Coachella2018, with more positive words such as “winners”, “invest”, and “highest”. Tweets under the hashtag #Beychella do not contain many sentiment words. However, words like “everybodymad” (“mad” here should mean excitement), “lovely”, “like”, and “crying” (here, crying from happiness and excitement) show that people have positive sentiment toward Beyonce’s performance at Coachella 2018. Since the wordclouds for all three data sets lean mostly positive, we can say the sentiment of the overall data is positive.

I also wrote a sentiment score function to calculate the score of every text. Text with negative sentiment has a negative score, and the higher the absolute value of the score, the stronger the sentiment. Below is the histogram for the overall data. We can see that the proportion of positive scores is larger and their sentiment is stronger, which means people are more likely to have enjoyed the music festival than to have been perplexed or upset by the chaos and crowds.
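
The sentiment score function itself is not shown in this report. One common way to obtain integer scores of this kind is to count positive-lexicon words and subtract the count of negative-lexicon words. The Python sketch below illustrates that idea only; it is an assumption, not the function used here, and the tiny word lists and the name `sentiment_score` are hypothetical.

```python
import re

# Tiny hypothetical lexicons for illustration only; a real analysis would
# load a full opinion lexicon with thousands of entries.
POSITIVE = {"great", "lovely", "like", "winners", "highest"}
NEGATIVE = {"damn", "chaos", "upset", "crowded"}


def sentiment_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (# positive words) - (# negative words) for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return pos - neg


# Made-up example texts, not taken from the collected data.
print(sentiment_score("Great set, lovely crowd"))       # 2
print(sentiment_score("Damn, the chaos at the gates"))  # -2
print(sentiment_score("Heading to Indio"))              # 0
```

Under this scheme the sign gives the direction of the sentiment and the absolute value gives its strength, so a text with no lexicon words scores 0.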

Map with sentiment

Having calculated sentiment scores, the next question is whether there is a relationship between the intensity of sentiment and people’s location. For example, people living in the West, such as in California, may have stronger sentiment than people on the East Coast, since the event took place in Indio, California. To visualize this question, below are the maps of sentiment score for the total data and for the three sets separately. Here red represents positive sentiment and blue represents negative sentiment. There seem to be more users located on the East Coast than on the West Coast. Also, the color for tweets from the West Coast is darker, especially in Southern California, which means people there have more positive sentiment. From the maps, we found that the sentiment map for #Beychella has the most positive sentiment but also the most negative sentiment. This shows that some people really liked Beyonce’s performance, while others may not have. Overall, however, most people enjoyed it.

Statistical Analysis

Summary table

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.0000  0.0000  0.0000  0.3175  1.0000  4.0000

From the summary of the scores, we can see that the average sentiment score is positive (0.3175), with a minimum of -2 and a maximum of 4. This means the overall sentiment leans positive, which matches the conclusion above.
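
For readers who want to reproduce this kind of six-number summary outside R, the sketch below computes the same quantities in Python. It assumes the default linear-interpolation quantile rule, which `statistics.quantiles` provides with `method="inclusive"`; the function name `six_number_summary` and the sample scores are hypothetical.

```python
import statistics


def six_number_summary(values):
    """Return min, quartiles, median, mean, and max of a list of numbers."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


# Made-up scores for illustration, not the collected data.
sample_scores = [-2, 0, 0, 0, 1, 1, 4]
summary = six_number_summary(sample_scores)
print(summary)
```

A mean above the median, as in the table above, indicates that a minority of strongly positive texts pulls the average up while the typical text scores 0.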

ANOVA table

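Below is the linear model summary, including the overall F-test, that analyzes the relationship between sentiment and retweet count. Retweet count is regressed on the absolute sentiment score, so the predictor measures the strength of sentiment regardless of direction. Here the p-value is below 0.05 (0.00309), so we reject the null hypothesis of no relationship at the 5% significance level. The conclusion is that sentiment strength is associated with retweet count: the stronger the sentiment, the higher the retweet count tends to be. Note, however, that R-squared is only about 0.004, so sentiment strength explains very little of the variation in retweet count.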

## 
## Call:
## lm(formula = total$retweetCount ~ total$absolute_score)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8981  -4202  -3407  -3301  92229 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3431.6      382.6   8.970  < 2e-16 ***
## total$absolute_score   1388.4      468.7   2.962  0.00309 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14270 on 1954 degrees of freedom
## Multiple R-squared:  0.004471,   Adjusted R-squared:  0.003961 
## F-statistic: 8.775 on 1 and 1954 DF,  p-value: 0.003091
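
To make the quantities in this output concrete, here is a minimal Python sketch of a one-predictor least-squares fit. It is an illustration under stated assumptions, not the code that produced the table (which comes from R's `lm`); the function name `simple_ols` and the toy data are hypothetical. The last lines check that the reported figures are mutually consistent, using the identities t = estimate / standard error and R² = F / (F + residual df) for a single predictor.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    df = n - 2  # residual degrees of freedom with one predictor
    se_slope = math.sqrt(rss / df / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "t_slope": slope / se_slope,
        "r_squared": 1 - rss / tss,
        "df": df,
    }


# Toy data for illustration, not the tweet data.
fit = simple_ols([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
print(fit)

# Consistency check on the figures reported in the table above.
reported_t = 1388.4 / 468.7                # estimate / standard error
reported_r2 = 8.775 / (8.775 + 1954)       # F / (F + residual df)
print(round(reported_t, 3), round(reported_r2, 6))  # 2.962 0.004471
```

Both recomputed values match the table, which confirms that the F-test and the slope's t-test carry the same information here, and that the small R-squared follows directly from the reported F-statistic.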

Conclusion

From the analysis above, we can see that the sentiment of the texts differs across keywords and hashtags. Overall, the data set leans positive. Also, the popularity of a text, represented by its retweet count, is related to the strength of its sentiment. For future improvement, it would be better to overcome the request limit of the Google geocoding API and gather more data. A larger data set would support this project with more factual evidence.